Portable Language Technology: Russian via Czech
Abstract
We report on morphological tagging of Russian using very limited Russian resources. We train the TnT tagger (Brants, 2000) on a modified Czech corpus to obtain the transition probabilities, on the assumption that the two languages are similar enough for this transition information to carry over. The Russian emission information is obtained from a morphological analyzer that does not rely on a manually created lexicon. Finally, we describe several simple, systematic modifications that transform Czech text into text with more Russian-like morphological properties.
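The abstract splits an HMM tagger's parameters across two languages: tag-to-tag transitions come from (modified) Czech, while word-level information for Russian comes from a lexicon-free morphological analyzer. The following is a minimal Python sketch of that division of labor under stated assumptions, not the authors' implementation. It uses a bigram model with add-k smoothing (TnT itself is a trigram tagger with linearly interpolated smoothing), scores the analyzer's candidate tags uniformly, and the tag names, toy corpus, suffix rules, and the functions `train_transitions`, `analyze`, and `viterbi` are all hypothetical.

```python
import math
from collections import defaultdict

START = "<s>"

def train_transitions(tagged_sents, k=0.1):
    """Estimate bigram log P(tag | prev) from a tagged corpus.

    Only the tags are used; the (Czech) word forms are discarded,
    since only the transition information is meant to transfer."""
    counts = defaultdict(lambda: defaultdict(int))
    tagset = set()
    for sent in tagged_sents:
        prev = START
        for _word, tag in sent:
            counts[prev][tag] += 1
            tagset.add(tag)
            prev = tag

    def log_prob(prev, tag):
        row = counts.get(prev, {})
        total = sum(row.values())
        # Add-k smoothing over the training tagset (an assumption, not TnT's scheme).
        return math.log((row.get(tag, 0) + k) / (total + k * len(tagset)))

    return log_prob, tagset

# Hypothetical toy analyzer: suffix rules only, no lexicon.
# Longer suffixes are listed before their shorter endings.
SUFFIX_RULES = [
    ("ая", {"ADJ-nom"}),
    ("ую", {"ADJ-acc"}),
    ("ит", {"VERB"}),
    ("у", {"NOUN-acc", "VERB"}),  # ambiguous ending: книгу vs. вижу
    ("а", {"NOUN-nom"}),
]

def analyze(word):
    """Return the set of candidate tags for a Russian word form."""
    for suffix, tags in SUFFIX_RULES:
        if word.endswith(suffix):
            return tags
    return set()

def viterbi(words, analyze, log_trans, tagset):
    """Most probable tag sequence; each candidate tag gets a uniform score."""
    if not words:
        return []
    columns = []
    prev_col = {START: (0.0, None)}
    for word in words:
        candidates = analyze(word) or tagset  # no analysis: allow every tag
        log_emit = -math.log(len(candidates))
        col = {}
        for tag in candidates:
            best_prev, best = max(
                ((p, score + log_trans(p, tag)) for p, (score, _) in prev_col.items()),
                key=lambda pair: pair[1],
            )
            col[tag] = (best + log_emit, best_prev)
        columns.append(col)
        prev_col = col
    # Backtrace from the best final tag.
    tag = max(prev_col, key=lambda t: prev_col[t][0])
    path = [tag]
    for col in reversed(columns[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical toy "Czech" training data with simplified tags.
CZECH = [
    [("nová", "ADJ-nom"), ("kniha", "NOUN-nom"), ("leží", "VERB")],
    [("starý", "ADJ-nom"), ("dům", "NOUN-nom"), ("stojí", "VERB")],
    [("vidím", "VERB"), ("novou", "ADJ-acc"), ("knihu", "NOUN-acc")],
]

if __name__ == "__main__":
    log_trans, tagset = train_transitions(CZECH)
    print(viterbi(["вижу", "новую", "книгу"], analyze, log_trans, tagset))
    # ['VERB', 'ADJ-acc', 'NOUN-acc']
```

In this toy run the analyzer alone cannot decide between `NOUN-acc` and `VERB` for "вижу" and "книгу"; the Czech-trained transitions resolve both. The Czech word forms themselves are never consulted, which is why such an approach depends on the two languages sharing comparable tags and tag-order patterns.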
Publication date: 2004